Non-equilibrium dynamics of gene expression and the Jarzynski equality 
In order to express specific genes at the right time, the transcription of genes is regulated by the 
presence and absence of transcription factor molecules. With transcription factor concentrations 
undergoing constant changes, gene transcription takes place out of equilibrium. In this paper we 
discuss a simple mapping between dynamic models of gene expression and stochastic systems driven 
out of equilibrium. Using this mapping, results of nonequilibrium statistical mechanics such as 
the Jarzynski equality and the fluctuation theorem are demonstrated for gene expression dynamics. 
Applications of this approach include the determination of regulatory interactions between genes 
from experimental gene expression data. 

PACS numbers: 87.16.Yc 87.10.Mn 87.16.dj 



Cellular dynamics is based on the expression of specific 
genes at specific times. The control over gene expression 
is a crucial feature of nearly all forms of life, as it allows 
an organism to respond to changing external and inter- 
nal conditions. With perfect regulatory control, only the 
DNA of those genes whose products are required at a 
given instant would be transcribed to m(essenger)RNA 
molecules. These mRNA molecules are in turn translated 
to proteins. For example, enzymes to break down nutri- 
ents are produced only when nutrients are present, or 
repair proteins are assembled to respond to DNA dam- 
age. 

To initiate the transcription of a gene, specific 
molecules, called transcription factors, locate and bind 
to DNA near the starting site of a gene. These molecules 
attract and activate an enzyme which reads off DNA, 
producing an RNA chain molecule according to the DNA 
template. Transcription factor molecules are themselves 
proteins and thus subject to regulatory control, through 
other transcription factors, or through themselves. As 
a result, mRNA and protein concentrations of different 
genes may have highly non-trivial interdependencies. A 
prominent example is the spatial-temporal evolution of 
protein concentrations in the early stages of embryonic 
development, leading to the formation of the body plan 
of an organism [1].

Despite the need for stringent control, gene regulation 
is an inherently noisy process [2]. At the level of single 
cells, only few molecules are involved, with single events 
potentially having a large impact.

In this paper, the dynamics of mRNA concentrations 
in synchronized cell populations is studied. The simplest 
model for the concentration x(t) of a given mRNA is [4],

$$\partial_t x = -\eta x + f + \sqrt{D}\,\xi(t) \qquad (1)$$

where $\eta$ is the decay constant of the mRNA molecule 
and $f$ is the average rate at which new molecules are 
produced by transcription of the corresponding gene. 

FIG. 1: Transcription and mRNA decay. (a) Transcription 
of a gene is controlled by the binding of transcription 
factors (left, shown in green) to the regulatory region of a gene. 
Transcription of a gene leads to the production of mRNA 
molecules at some rate $f$. mRNA molecules decay at a rate 
$\eta$ per molecule. (b) The resulting dynamics of mRNA 
concentration $x$ can be mapped onto a harmonic oscillator subject 
to a restoring force $-\eta x$ and an external force $f$ driving the 
system out of equilibrium. 



The term $\xi(t)$ describes all other processes, including 
changes in the transcription rate due to changing 
transcription factor concentrations. Their influence has been 
modeled by a random uncorrelated variable with mean 
zero and covariance $\langle \xi(t)\,\xi(t') \rangle = \delta(t - t')$. 
Equation (1) is well-known as the Langevin equation of an 
Ornstein-Uhlenbeck process describing the motion of an 
overdamped particle with position $x$ in a quadratic 
potential $V(x) = (\eta x - f)^2/(2\eta)$ [5]. A thermal bath with 
inverse temperature $\beta = 2/D$ given by the Einstein 
relation exerts a random force leading to an equilibrium 
solution $P_{eq}(x) \sim \exp\{-\beta V(x)\}$, see Fig. 1.
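
The equilibrium picture can be checked numerically. The following is a minimal sketch, assuming a simple Euler-Maruyama discretisation of the Langevin equation (1); the function name `simulate_ou` and all parameter values are illustrative assumptions and are not fitted to any data. For the quadratic potential, the equilibrium distribution is a Gaussian with mean $f/\eta$ and variance $D/(2\eta)$.

```python
import numpy as np

def simulate_ou(eta, f, D, dt, n_steps, x0, rng):
    """Euler-Maruyama integration of dx/dt = -eta*x + f + sqrt(D)*xi(t)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Each noise increment has variance D*dt (white noise with unit covariance).
    noise = rng.standard_normal(n_steps) * np.sqrt(D * dt)
    for t in range(n_steps):
        x[t + 1] = x[t] + (-eta * x[t] + f) * dt + noise[t]
    return x

rng = np.random.default_rng(0)
eta, f, D = 1.0, 2.0, 0.5  # illustrative values only
x = simulate_ou(eta, f, D, dt=0.01, n_steps=200_000, x0=f / eta, rng=rng)

# P_eq(x) ~ exp(-beta*V(x)) with beta = 2/D is Gaussian with
# mean f/eta and variance D/(2*eta).
print("mean:", x.mean(), "expected:", f / eta)
print("variance:", x.var(), "expected:", D / (2 * eta))
```

The sample mean and variance should approach the equilibrium values up to sampling noise and a small discretisation bias of order `dt`.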

We probe this equilibrium scenario using experimental 
measurements [|[ of expression levels of all yeast genes 
taken at discrete intervals $\Delta t$ [31]. In order to allow 
comparison across genes, we rescale the expression levels $x$ of 
each gene using $q = \sqrt{2/(D\eta)}\,(\eta x - f)$ so the 
distribution of $q$ in equilibrium is $P(q) \sim \exp\{-q^2/2\}$. The 
parameters $\eta$, $f$, $D$ for each gene were determined by 
maximizing the likelihood $P_{\eta,f,D}(\mathbf{x})$ of the expression levels 
$\mathbf{x} = \{x(t)\}$ with respect to the free parameters. The 
likelihood $P_{\eta,f,D}(\mathbf{x}) = \prod_t G_{\eta,f,D}(x_{t+\Delta t}|x_t)$, where 

$$G_{\eta,f,D}(x_{t+\Delta t}|x_t) = \frac{1}{\sqrt{2\pi D \Delta t}} \exp\left\{-\frac{\Delta t}{2D}\,(\partial_t x + \eta x_t - f)^2\right\}$$

is given in terms of the short-term propagator of the 
Langevin equation (1). Drift and diffusion under this 
propagator can be compared in detail with the experi- 
mentally measured expression levels Q . 
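
The likelihood maximization can be sketched in a few lines. Under the Gaussian short-term propagator, maximizing over $\eta$ and $f$ is a least-squares regression of the finite-difference derivative on $x_t$, and the estimate of $D$ follows from the mean squared residual. The function name `fit_langevin` and the synthetic test trajectory are assumptions made for illustration; the actual fitting procedure used for the yeast data may differ in detail.

```python
import numpy as np

def fit_langevin(x, dt):
    """Maximum-likelihood eta, f, D under the Gaussian short-time propagator."""
    x = np.asarray(x, dtype=float)
    dxdt = np.diff(x) / dt                      # finite-difference d_t x
    # Regress d_t x = -eta * x_t + f: slope is -eta, intercept is f.
    slope, intercept = np.polyfit(x[:-1], dxdt, 1)
    eta, f = -slope, intercept
    resid = dxdt + eta * x[:-1] - f
    D = dt * np.mean(resid ** 2)                # from d(log-likelihood)/dD = 0
    return eta, f, D

# Synthetic check with illustrative parameter values (not from the paper).
rng = np.random.default_rng(1)
eta_true, f_true, D_true, dt = 1.0, 2.0, 0.5, 0.05
x = np.empty(20_001)
x[0] = f_true / eta_true
for t in range(20_000):
    drift = (-eta_true * x[t] + f_true) * dt
    x[t + 1] = x[t] + drift + np.sqrt(D_true * dt) * rng.standard_normal()

eta_hat, f_hat, D_hat = fit_langevin(x, dt)
print(eta_hat, f_hat, D_hat)
```

Real expression time courses contain far fewer points than this synthetic trajectory, so the estimates for individual genes carry correspondingly larger uncertainties.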

Figure 2 shows the distribution of rescaled expression 
levels $q$ across all genes and times. While the observed 
distribution $P(q)$ is roughly compatible with the 
equilibrium Gaussian distribution, the statistics of 
expression levels is not stationary. As an example, we consider 
the set of target genes of a transcription factor called 
Swi4 [32]. The average value $\langle q(t) \rangle_{Swi4}$ of the target 
genes at different times varies over the experimental time 
course, and these average values are correlated with the 
expression levels of the transcription factor Swi4, see 
inset of Fig. 2.




FIG. 2: Empirical statistics of gene expression levels. 
The set of (rescaled) expression levels of all yeast genes at 
different times along the cell cycle has a distribution roughly 
compatible with the equilibrium distribution of the Langevin 
equation (1) (solid red line). Deviations at high and low 
expression levels might in principle be due to non-linearities of 
DNA hybridisation to probes. Inset: However, the 
distribution of expression levels is not stationary, but changes with the 
expression level of transcription factors. Here the mean 
expression levels $\langle q(t) \rangle_{Swi4}$ of Swi4 target genes at a given time 
$t$ are plotted against the expression level $y(t - \Delta t)$ of their 
transcription factor Swi4 at the preceding measurement. The 
mean expression level of target genes is positively correlated 
with the expression level of the transcription factor, which 
changes continuously over the cell cycle. 

This result is not unexpected: mRNA and protein 
concentrations of transcription factors change on the 
same timescales as the concentrations of products of other 
genes. Rather than the rapid fluctuations of the stochastic 
term in the Langevin equation (1), the effect of 
transcription factors on their targets is a driving force with 
a dynamics on the same timescale as that of the target 
genes. In consequence, mRNA concentrations are kept 
out of equilibrium. 

These observations call for a non-equilibrium approach 
to gene expression dynamics, which is the subject of this 
Letter. The non-equilibrium regime is characterized by 
changes in the statistics of gene expression levels over 
time. These are correlated with the expression levels of 
the corresponding transcription factors. We model the 
dynamics of mRNA concentration by the driven Langevin 
equation 

$$\partial_t x = -\eta x + f(y) + \sqrt{D}\,\xi(t) , \qquad (2)$$

with the transcription rate $f(y)$ depending on the 
concentration $y$ of a given transcription factor at time $t$. 
This equation can easily be generalized to describe the 
effects of several transcription factors. The stochastic 
term $\xi(t)$ characterizes all processes not yet described 
by $f(y, \ldots)$. In this sense, (2) serves as a first starting 
point towards an increasingly deterministic description 
of mRNA dynamics. In the following, we will neglect 
post-transcriptional regulation and take the mRNA 
expression level of a transcription factor as a proxy for its 
protein concentration [10].

The equation of motion for the mRNA concentration (2) 
describes an overdamped harmonic oscillator 
subject to an external force $f(y)$. Thus the dynamics 
of transcription factor concentration $y(t)$ results in 
a time-dependent external force $f(t) = f(y(t))$. In the 
picture of a particle moving in a quadratic potential, 
$V(x, t) = (\eta x - f(t))^2/(2\eta)$ now is a time-dependent 
potential whose origin changes with time. With each change 
of the external force $\Delta f_t = f_t - f_{t-1}$, and hence of 
the potential, work is performed on the system. The 
total work performed by the external force $f(t)$ between 
initial and final point of the time course is denoted 
$W = \sum_t \Delta W_t$, with $\Delta W_t = (\partial V/\partial f)\,\Delta f_t = -\frac{\eta x - f}{\eta}\,\Delta f_t$.
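
As an illustration of this definition, the work along a discretely sampled trajectory can be accumulated step by step. This is a minimal sketch: the function name `work_increments` and the toy numbers are assumptions, and evaluating $x$ and $f$ just before each change of the force is one possible discretisation choice, made here for simplicity.

```python
import numpy as np

def work_increments(x, f, eta):
    """Work done by each change of the external force f_{t-1} -> f_t.

    dW_t = (dV/df) * df_t = -(eta*x - f)/eta * df_t, evaluated at the state
    and force just before the change (a choice assumed for this sketch).
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    df = np.diff(f)
    return -(eta * x[:-1] - f[:-1]) / eta * df

# Toy trajectory and force time course (illustrative numbers only).
x = np.array([0.8, 1.2, 1.7, 1.4])
f = np.array([1.0, 1.5, 1.5, 1.0])
dW = work_increments(x, f, eta=1.0)
W = dW.sum()
print(dW, W)
```

In this toy example, work is positive when the force is raised while $x$ lies below the current minimum $f/\eta$ of the potential, and also when the force is lowered while $x$ lies above it.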

The work W quantifies the coupling of changes in the 
transcription factor concentration to the mRNA concen- 
tration of a target gene and serves as the central measure 
of the non-equilibrium approach. To evaluate this 
quantity, we determine $f(y)$ within a simple model of 
transcriptional activation: the probability of a transcription 
factor being bound at a given binding site in the 
regulatory region of a target gene depends on its concentration 
$y$, binding energy $\epsilon$, and the free energy $\mathcal{F}$ of the 
transcription factor in solution or bound elsewhere [11]. This 
model gives 



$$f(y) = f_0 \ldots \qquad (3)$$



assuming the transcription rate to depend linearly on the 
probability that the binding site is occupied at a given 
time. $f_0$ is a basal transcription rate in the absence 
of transcription factors and $\delta$ quantifies the change of 
the transcription rate due to transcription factor 
binding. The functional form (3) is the celebrated 
Michaelis-Menten kinetics, first studied in the context of enzymatic 
reactions nearly a century ago [12] and used widely in 
transcription modelling [13]. The free parameters of the 
model (3) are inferred for each gene from its mRNA 
concentration trajectory as above. 
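
A rate function of this type can be sketched as follows, assuming the standard Michaelis-Menten occupancy $y/(y + K)$. The half-saturation constant `K` is a hypothetical parameter introduced here to stand in for the dependence on binding energy and free energy, and the function name is illustrative.

```python
def transcription_rate(y, f0, delta, K):
    """Basal rate f0 plus a change delta weighted by site occupancy.

    Assumes a Michaelis-Menten occupancy y / (y + K); K is a hypothetical
    half-saturation constant, not a parameter named in the text.
    """
    occupancy = y / (y + K)
    return f0 + delta * occupancy

# Illustrative values: no factor present, half saturation, near saturation.
print(transcription_rate(0.0, f0=1.0, delta=2.0, K=0.5))
print(transcription_rate(0.5, f0=1.0, delta=2.0, K=0.5))
print(transcription_rate(50.0, f0=1.0, delta=2.0, K=0.5))
```

A positive `delta` corresponds to an activating factor and a negative `delta` to a repressing one, with the rate saturating at `f0 + delta` for large concentrations.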

Fig. 3(a) shows, for different targets of the transcription 
factor Swi4, the distribution of work $W$ performed by 
changes in the Swi4 expression level over the time course. 
The free energy $F$ of the equilibrium distribution of $x$, 
given by $\exp\{-\beta F\} = \int dx\, \exp\{-\beta V(x)\} = \sqrt{\pi D/\eta}$, 
does not change with $f$, since changes in the force $f$ only 
shift the origin of the potential $V(x)$. The distribution 
of work for the different genes obeys $\langle W \rangle > \Delta F = 0$ as 
required by the second law of thermodynamics. However, 
a small number of trajectories has $W < \Delta F$. 




FIG. 3: The Jarzynski equality for gene expression. (a) 
The target genes of transcription factor Swi4 show a broad 
distribution of work $\beta W$ performed by changes in Swi4 
expression levels, with $\langle W \rangle > \Delta F = 0$. Inset: The distribution 
of $\exp\{-\beta W\}$ has a mean of 0.96 ± 0.33 compatible with the 
Jarzynski equality. (b) A detailed relationship links the 
probabilities of paths with positive and negative work performed, 
see text. The main figure shows the relationship for work $\Delta W_t$ 
performed between individual timesteps, the inset shows the 
same relationship for the overall work $W$ performed over the 
full time course. 



A remarkable equality derived by C. Jarzynski [14] 
links the work performed on the system averaged over 
many realizations of the external forcing time course with 
the associated change in free energy, 

$$\langle \exp\{-\beta W\} \rangle = \exp\{-\beta \Delta F\} . \qquad (4)$$

For a single trajectory of the system driven out of 
equilibrium by the external force, $W$ is a random number 
depending on microscopic details. According to the 
Jarzynski equality, however, the average of $\exp\{-\beta W\}$ over all 
trajectories equals $\exp\{-\beta \Delta F\}$. Its use in chemical 
reaction networks has been described theoretically in [15].



In a living organism, a specific time course of transcrip- 
tion factor concentration is hard to repeat many times in 
order to perform an average over trajectories. However, 
many target genes respond to the time course of the 
transcription factor. For each target, $W$ is a random 
number which depends on the detailed trajectory, while the 
mean of $\exp\{-\beta W\}$ over targets should equal $\exp\{-\beta \Delta F\} = 1$. The 
inset of Fig. 3(a) shows the distribution of $\exp\{-\beta W\}$ 
across the target genes of Swi4. It displays a broad 
distribution with mean and standard error 0.96 ± 0.33 in 
agreement with the Jarzynski equality [33].
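
The equality can be illustrated on simulated trajectories of the driven Langevin equation. The sketch below is an idealised numerical experiment, not a reanalysis of the expression data: it assumes a hypothetical linear ramp of the force, starts each trajectory in equilibrium, and alternates a change of the force at fixed $x$ with exact Ornstein-Uhlenbeck relaxation over a time `dt`. The work per step is taken as the exact change of the quadratic potential at fixed $x$, which equals $-(\eta x - f)/\eta\,\Delta f$ with $f$ evaluated at the midpoint of the step. All names and parameter values are assumptions.

```python
import numpy as np

def driven_work(eta, D, f_protocol, dt, n_traj, rng):
    """Work along n_traj trajectories of a harmonic oscillator with shifted origin."""
    sigma_eq = np.sqrt(D / (2.0 * eta))          # equilibrium std of x
    # Start in equilibrium for the initial force.
    x = f_protocol[0] / eta + sigma_eq * rng.standard_normal(n_traj)
    W = np.zeros(n_traj)
    decay = np.exp(-eta * dt)
    sigma_step = sigma_eq * np.sqrt(1.0 - decay ** 2)
    for f_old, f_new in zip(f_protocol[:-1], f_protocol[1:]):
        # Work step: change of V(x, f) at fixed x when the force changes.
        W += ((eta * x - f_new) ** 2 - (eta * x - f_old) ** 2) / (2.0 * eta)
        # Relaxation step: exact OU transition under the new force.
        mu = f_new / eta
        x = mu + (x - mu) * decay + sigma_step * rng.standard_normal(n_traj)
    return W

rng = np.random.default_rng(2)
eta, D = 1.0, 1.0                                # illustrative values
beta = 2.0 / D
f_protocol = np.linspace(0.0, 1.0, 11)           # hypothetical ramp of the force
W = driven_work(eta, D, f_protocol, dt=0.5, n_traj=20_000, rng=rng)

jarzynski_mean = np.exp(-beta * W).mean()
print("<W> =", W.mean(), " <exp(-beta W)> =", jarzynski_mean)
```

Since the force only shifts the origin of the potential, $\Delta F = 0$ and the exponential average should be close to 1, while the mean work remains positive.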

An even stronger statement holds, from which the 
Jarzynski equality follows. Fig. 3(b) shows the 
probabilities of positive and negative work $P(W)$ and $P(-W)$ 
to be linked by a detailed fluctuation theorem [3, \v\ 

$$\frac{P(\beta W - \beta \Delta F = \beta w)}{P(\beta W - \beta \Delta F = -\beta w)} = \exp\{\beta w\} , \qquad (5)$$

which shows how trajectories with work less than the 
change in free energy are exponentially less likely than 
those with work performed in excess of the free energy 
change. This relationship can be derived for generic time 
courses involving shifts of the origin of a quadratic po- 
tential [lij ]. Thus the result that a detailed fluctuation 
theorem holds for the work performed by the changing 
transcription factor concentration serves as evidence for 
the linear equation of motion (2).
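
The link between relation (5) and the Jarzynski equality can be made explicit in a simple special case. The sketch below assumes that the work distribution is Gaussian, which is an assumption of this example (plausible for linear dynamics with Gaussian noise, but not stated in the text). For a Gaussian with mean $\langle W \rangle$ and $\Delta F = 0$, the Jarzynski equality fixes the variance to $2\langle W \rangle/\beta$, and the log-ratio of probabilities of $+w$ and $-w$ is then exactly $\beta w$. The function name and numbers are illustrative.

```python
def log_ratio_gaussian(w, mean, var):
    """log[P(W = w) / P(W = -w)] for Gaussian-distributed work."""
    log_p_plus = -(w - mean) ** 2 / (2.0 * var)
    log_p_minus = -(-w - mean) ** 2 / (2.0 * var)
    return log_p_plus - log_p_minus  # normalisation constants cancel

beta = 2.0
mean_work = 0.3                      # illustrative value
var_work = 2.0 * mean_work / beta    # variance implied by <exp(-beta W)> = 1

w = 0.7
print(log_ratio_gaussian(w, mean_work, var_work), beta * w)
```

Any other variance would give a log-ratio that is still linear in $w$ but with a slope different from $\beta$.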

So far, we have focused on the statistics of mRNA con- 
centration trajectories given the parameters of stochastic 
models like (2). The reverse question, namely, what 
information on transcription regulation can be extracted 
from experimentally measured expression levels is an im- 
portant question in systems biology and bioinformat- 
ics lj|, 20, [22| . Some simple attributes are already in- 



herent in the observations of non-equilibrium behaviour. 
For instance, from the example in Fig. 2 one can deduce 
that the transcription factor Swi4 acts as an enhancer of 
transcription, rather than a repressor, since the average 
expression level of its targets increases with expression 
level of Swi4. Similarly, the targets of a transcription 
factor can be determined from the inferred relationship 
f(y) between the expression levels of a transcription fac- 
tor and that of a (potential) target gene. This "reverse 
engineering" of regulatory interactions is particularly rel- 
evant for transcription factors with ill-characterized bind- 
ing sequence, and for factors which do not bind directly 
to regulatory DNA (so-called co-factors). For all genes we 
compute the range of values of $f(y)$ over the range of $y$. 
Genes with a large response $|f(y_{max}) - f(y_{min})|$ to 
changing transcription factor expression levels are presumed 
target genes. The top ten targets of Swi4 predicted in 
this way are listed in Table I. We test these predictions 
by searching the regulatory regions of the predicted 
targets for copies of the binding sequence [32]. In all but one 
of the predicted targets one finds at least one Swi4 
binding site. Furthermore, 8 of the 10 predictions have been 

Gene     Binding sites   Evidence
CDC9     1               X
RAD27    1               ✓
RNR1     1               ✓
PRY2     3               ✓
YG3N     1               ✓
CSI2     4               ✓
CRH1     1               ✓
PMS5     2
YI01     1               ✓
CDC21                    ✓


TABLE I: Predicted transcription factor target genes. 
The top ten predicted target genes of transcription factor 
Swi4 are listed along with the number of Swi4 binding sites 
in the regulatory regions of those genes [32]. Check marks 
indicate existing experimental evidence for a direct regulatory 
interaction [23]. About 3% of the yeast genes have such direct 
evidence for regulation by Swi4. 

previously found experimentally [23]. A more detailed 
account will be published elsewhere [9].
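
The ranking step described above can be sketched as follows. The gene names, fitted parameters and factor time course are hypothetical placeholders, and the rate function assumes a Michaelis-Menten form with a hypothetical half-saturation constant `K`; only the ranking by $|f(y_{max}) - f(y_{min})|$ follows the text.

```python
def mm_rate(y, f0, delta, K):
    """Assumed Michaelis-Menten rate; K is a hypothetical constant."""
    return f0 + delta * y / (y + K)

def rank_targets(params, y_series):
    """Rank genes by the response |f(y_max) - f(y_min)| to the factor."""
    y_min, y_max = min(y_series), max(y_series)
    response = {
        gene: abs(mm_rate(y_max, *p) - mm_rate(y_min, *p))
        for gene, p in params.items()
    }
    ranking = sorted(response, key=response.get, reverse=True)
    return ranking, response

# Hypothetical fitted parameters (f0, delta, K) for three placeholder genes.
params = {
    "geneA": (1.0, 2.0, 0.5),
    "geneB": (1.0, 0.1, 0.5),
    "geneC": (0.5, -1.0, 1.0),
}
y_series = [0.2, 0.5, 1.0, 0.8]   # placeholder factor expression levels

ranking, response = rank_targets(params, y_series)
print(ranking)
```

Because the absolute value is used, a strongly repressed gene such as the placeholder `geneC` ranks as a presumed target as well; the sign of `delta` then distinguishes activation from repression.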

In summary, we have shown how regulatory interac- 
tions generate correlations between expression levels of 
transcription factors and their target genes. A simple 
mapping to a driven harmonic oscillator depicts the tran- 
scription factor concentrations as an external force, which 
drives the expression levels of target genes out of 
equilibrium. The central quantity of this approach is the work 
performed by the external force. Such dynamic observables 
provide a more detailed fingerprint of the complex bio- 
physical machinery behind gene expression than heuristic 
measures like correlation coefficients. 

It turns out that the work performed by the external 
force is of the same order of magnitude as the 
temperature of the heat bath describing stochastic effects, so 
$|\beta W| \sim 1$. Macroscopic systems generally have $|\beta W| \gg 1$. 
As a result, experimental observation of the 
fluctuations at the centre of the Jarzynski equality and related 
theorems [24] has been limited to the mechanical 
properties of biomolecules [26] and colloidal systems [27]. 
The correlated dynamics and complex responses of gene 
expression offer a proving ground for stochastic 
thermodynamics. Temporal data on other types of molecules 
apart from mRNA will lead to new challenges in the 
non-equilibrium dynamics of genetic regulation. 